A Unified View of Spectral Clustering∗
Authors
Abstract
We formulate a discrete optimization problem that leads to a simple and informative derivation of a widely used class of spectral clustering algorithms. Regarding the algorithms as attempting to bi-partition a weighted graph with N vertices, our derivation indicates that they are inherently tuned to tolerate all partitions into two non-empty sets, independently of the cardinality of the two sets. This approach also helps to explain the difference in behavior observed between methods based on the unnormalized and normalized graph Laplacian. We also give a direct explanation of why Laplacian eigenvectors beyond the Fiedler vector may contain fine-detail information of relevance to clustering. Another advantage of our discrete formulation is that it admits a random graph interpretation, showing that spectral clustering may be viewed as maximum likelihood partitioning under the assumption that the data is an instance of a graph with random edge weights. The resulting distribution on the weights formalizes and quantifies the intuitive notion that vertices in the same cluster are more likely to have high weights than vertices in different clusters. Numerical experiments that illustrate the analysis are included.
Keywords: balancing threshold, Rayleigh-Ritz Theorem, Fiedler vector, graph Laplacian, random graph, maximum likelihood, partitioning.
AMS Subject Classification: 65F15, 90C27, 05C85
This manuscript appears as University of Strathclyde Mathematics Research Report 02 (2004).
Department of Mathematics, University of Strathclyde, Glasgow G1 1XH, UK. Supported by Research Fellowships from The Leverhulme Trust and The Royal Society of Edinburgh/Scottish Executive Education and Lifelong Learning Department.
Department of Mathematics, University of Turku, FIN-20014 Turku, Finland. Supported by the Academy of Finland, under grant number 53441.
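The bi-partitioning setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is the textbook Fiedler-vector heuristic, written under several assumptions. The function name `fiedler_bipartition`, the example weight matrix, and the choice of zero as the splitting threshold are all hypothetical choices made for illustration, and every vertex is assumed to have positive degree. The `normalized` flag switches between the unnormalized problem L x = λ x and the normalized problem L x = λ D x.

```python
import numpy as np


def fiedler_bipartition(W, normalized=False):
    """Split a weighted graph into two sets using the sign of the Fiedler vector.

    W is a symmetric, non-negative weight matrix with zero diagonal.
    Assumption: every vertex has positive degree, and zero is used as the
    splitting threshold (an illustrative choice, not taken from the paper).
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # vertex degrees
    L = np.diag(d) - W                     # unnormalized graph Laplacian
    if normalized:
        d_inv_sqrt = 1.0 / np.sqrt(d)
        # Symmetric normalization D^{-1/2} L D^{-1/2}
        L = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L)
    v = eigvecs[:, 1]                      # eigenvector of the second-smallest eigenvalue
    if normalized:
        v = v * d_inv_sqrt                 # map back to a solution of L x = lambda D x
    return v >= 0                          # boolean cluster labels


# Hypothetical example: two triangles joined by one weak edge (vertices 2 and 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels_unnormalized = fiedler_bipartition(W)
labels_normalized = fiedler_bipartition(W, normalized=True)
```

Because the sign of an eigenvector is arbitrary, which of the two sets is labelled `True` is not determined; only the grouping of vertices is meaningful.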
Similar resources
A Least-Squares Unified View of PCA, LDA, CCA and Spectral Graph Methods
Over the last century Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA) and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. This paper proposes a unified framework to formulate PCA, LDA, CCA, and SC as a ...
From Ensemble Clustering to Multi-View Clustering
Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring the high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noises existing in each individual view. To overcome this problem, we pro...
A Unified Framework for Discrete Spectral Clustering
Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (i.e., first learning continuous labels and then rounding the learned labels into discrete ones), which may lead to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as re...
A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts
Recently, a variety of clustering algorithms have been proposed to handle data that is not linearly separable. Spectral clustering and kernel k-means are two such methods that are seemingly quite different. In this paper, we show that a general weighted kernel k-means objective is mathematically equivalent to a weighted graph partitioning objective. Special cases of this graph partitioning ob...
Imagerank: spectral techniques for structural analysis of image database
Drawing on the correspondence between spectral clustering, spectral dimensionality reduction, and the connections to Markov chain theory, we present a novel unified framework for structural analysis of image databases using spectral techniques. The framework provides a computationally efficient approach to both clustering and dimensionality reduction, or 2-D visualization. Within this framewo...
Guided Co-training for Large-Scale Multi-View Spectral Clustering
In many real-world applications, we have access to multiple views of the data, each of which characterizes the data from a distinct aspect. Several previous algorithms have demonstrated that one can achieve better clustering accuracy by integrating information from all views appropriately than using only an individual view. Owing to the effectiveness of spectral clustering, many multi-view clus...